Header



Final Project





Niv Yosef

City College of New York

B2100 Empirical Research Methods

Slide 1



The Ames Housing dataset contains various features about houses in Ames, Iowa, and their sale prices. The goal of this analysis is to understand the key factors that influence house prices in this area.





Goals:


1. Explore the dataset and identify important features.
2. Analyze the correlation between different features and sale price.
3. Provide insights and visualizations to help understand the dataset better.


Slide 2

To understand the relationships between different features and sale prices, we started by calculating the correlation matrix. This heatmap shows the correlation of each feature with the sale price, helping us identify which features are most strongly associated with higher or lower prices.
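A minimal Python sketch of the sale-price column of that correlation matrix follows. It assumes Pearson correlation, which the slide does not name, and uses a few hypothetical rows rather than the real Ames data. The column names SF, Overall_Qual, and Year_Built follow the naming used in these notes. Features are ranked by the absolute value of r, so strong negative associations would also rise to the top.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    # Population covariance divided by the product of population standard deviations.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

def rank_by_correlation(features, target):
    """Return (feature, r) pairs sorted by |r| with the target, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical toy rows for illustration only, NOT real Ames observations.
sale_price = [120_000, 150_000, 180_000, 240_000, 310_000]
features = {
    "SF": [1000, 1300, 1500, 2100, 2700],
    "Overall_Qual": [4, 5, 6, 7, 9],
    "Year_Built": [1950, 2005, 1970, 1999, 1985],
}

for name, r in rank_by_correlation(features, sale_price):
    print(f"{name:>12}: r = {r:+.2f}")
```

On the full dataset, the same ranking would usually be computed with a data-frame library, but the arithmetic is identical.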


Slide 3

The following boxplot shows the relationship between the overall quality of houses and their sale prices in the Ames Housing dataset.





Notes


1. Positive Correlation:
• There is a clear positive correlation between the overall quality of houses and their sale prices. As the quality rating increases from “Very Poor” to “Very Excellent,” the median sale price also increases.

2. Median Sale Prices:
• Houses with lower quality ratings (e.g., “Very Poor,” “Poor”) have lower median sale prices.
• Houses with higher quality ratings (e.g., “Very Good,” “Excellent,” “Very Excellent”) have significantly higher median sale prices.

3. Variation in Sale Prices:
• The interquartile range (IQR), represented by the height of each box, shows the spread of the middle 50% of sale prices within each quality category. Higher quality ratings tend to have taller boxes, indicating more variability in sale prices.
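The median and IQR that each box encodes can be computed directly. The sketch below groups hypothetical (Overall_Qual, sale price) pairs by quality rating and reports each group's median and IQR using Python's `statistics` module. Quartile conventions differ between tools, so these numbers may not match a plotted box exactly.

```python
from statistics import median, quantiles

def summarize_by_quality(rows):
    """Return {quality: (median price, IQR)} from (quality, price) pairs."""
    groups = {}
    for quality, price in rows:
        groups.setdefault(quality, []).append(price)
    summary = {}
    for quality in sorted(groups):
        prices = groups[quality]
        # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
        q1, _, q3 = quantiles(prices, n=4)
        summary[quality] = (median(prices), q3 - q1)
    return summary

# Hypothetical (Overall_Qual, sale price) pairs, NOT real Ames observations.
rows = [
    (4, 120_000), (8, 300_000), (4, 100_000), (8, 250_000), (4, 140_000),
    (8, 400_000), (4, 110_000), (8, 270_000), (4, 125_000), (8, 340_000),
]

for quality, (med, iqr) in summarize_by_quality(rows).items():
    print(f"Overall_Qual {quality}: median ${med:,.0f}, IQR ${iqr:,.0f}")
```

In this toy data, the higher-quality group has both a higher median and a wider IQR, which is the pattern described in the notes above.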

Slide 4

The correlation analysis showed that square footage is the feature most strongly associated with sale price. The following table splits the square-footage (SF) data into three categories: Small (< 1,500 sq ft), Medium (1,500 to 2,500 sq ft), and Large (> 2,500 sq ft).


Notes



1. Market Segmentation: The housing market is segmented based on the size of the house, with distinct price ranges for different square footage categories.

2. Demand for Space: The substantial price difference between the categories suggests that buyers place a high value on larger living spaces.

3. Luxury Segment: The high-priced outliers in the Large category suggest a luxury segment within the market that caters to buyers looking for premium features and larger homes.



Further Investigation:


Further analysis could explore the specific features that contribute to the higher prices of large houses (e.g., number of bedrooms, location, year built).
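The size categories can be expressed as a small binning function, followed by a median price per bin. This is a sketch on hypothetical rows. One assumption: the slide's boundaries leave houses of exactly 1,500 or 2,500 sq ft unassigned, so this version places both values in Medium.

```python
from statistics import median

def size_category(sqft):
    """Small < 1,500 sq ft; Medium 1,500 to 2,500 (inclusive, an assumption); Large > 2,500."""
    if sqft < 1500:
        return "Small"
    if sqft <= 2500:
        return "Medium"
    return "Large"

def median_price_by_size(rows):
    """Group (sqft, price) pairs by size category and return each group's median price."""
    groups = {}
    for sqft, price in rows:
        groups.setdefault(size_category(sqft), []).append(price)
    return {category: median(prices) for category, prices in groups.items()}

# Hypothetical (square feet, sale price) pairs, NOT real Ames observations.
rows = [
    (1200, 130_000), (1400, 145_000), (1500, 170_000), (2000, 210_000),
    (2500, 250_000), (2600, 320_000), (3400, 450_000),
]

print(median_price_by_size(rows))
```

Making the boundary rule explicit in code ensures that every house falls into exactly one category.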

Slide 5

Exploring the Relationship Between Living Area, Neighborhood, and Sale Price


Notes



1. Neighborhood Influence:
- The effect of living area (SF) on sale price differs from one neighborhood to another.
- Sale prices are higher in certain neighborhoods, indicating that location is a significant factor.

2. Positive Correlation Across Neighborhoods:
- Living area and sale price are generally positively correlated: larger houses tend to sell for higher prices in most neighborhoods.

3. Inconsistency in Price Trends:
- Some neighborhoods show a steeper increase in sale price as living area increases, highlighting the premium placed on larger homes in those areas.
- The inconsistency suggests that factors beyond living area and neighborhood also influence house prices.
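One way to quantify a "steeper increase" is to fit a separate least-squares slope of sale price on living area for each neighborhood. Each slope can be read as the extra dollars associated with one extra square foot. The sketch below assumes simple per-neighborhood linear regression, which the slide does not state. The neighborhood names and rows are hypothetical.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (dollars per extra square foot)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def slope_by_neighborhood(rows):
    """Fit one slope per neighborhood from (neighborhood, sqft, price) rows."""
    groups = {}
    for hood, sqft, price in rows:
        xs, ys = groups.setdefault(hood, ([], []))
        xs.append(sqft)
        ys.append(price)
    return {hood: ols_slope(xs, ys) for hood, (xs, ys) in groups.items()}

# Hypothetical rows; "Neighborhood_A" and "Neighborhood_B" are placeholder names.
rows = [
    ("Neighborhood_A", 1000, 150_000), ("Neighborhood_A", 1500, 200_000),
    ("Neighborhood_A", 2000, 250_000), ("Neighborhood_B", 1000, 120_000),
    ("Neighborhood_B", 1500, 140_000), ("Neighborhood_B", 2000, 160_000),
]

for hood, slope in slope_by_neighborhood(rows).items():
    print(f"{hood}: about ${slope:,.0f} per extra square foot")
```

In this toy data, Neighborhood_A places a larger premium on space than Neighborhood_B, which is the kind of difference the scatter plot shows visually.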

Slide 6

Clustering Analysis of Ames Housing Data by Living Area and Sale Price


Notes


1. Cluster Characteristics:

- Cluster 1 (Red): Represents homes with moderate living areas and sale prices. These houses might be in average neighborhoods, offering a balance between size and affordability.
- Cluster 2 (Green): Likely consists of smaller homes with lower overall quality and fewer total rooms, leading to lower sale prices. These houses might be in less desirable neighborhoods or older areas.
- Cluster 3 (Blue): Comprises larger homes with higher sale prices, indicating premium features such as newer construction, higher quality, and more desirable neighborhoods.


2. Influence of Neighborhood and Features:


The clustering model considers multiple variables such as SF, Overall_Qual, Year_Built, TotRms_AbvGrd, and Neighborhood. Higher quality and newer homes in desirable neighborhoods tend to cluster together, while older, smaller, and lower-quality homes form separate clusters.

3. Market Segmentation:


The identified clusters reflect different segments of the housing market.
- Cluster 2 (Green) represents entry-level housing.
- Cluster 1 (Red) represents mid-range homes.
- Cluster 3 (Blue) represents high-end properties.
This segmentation helps in understanding the distribution and pricing strategies within the housing market in Ames, Iowa.
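The slide does not name the clustering algorithm. The sketch below assumes k-means (Lloyd's algorithm) with k = 3. For simplicity, it clusters only two numeric features, living area and sale price, after z-scoring them so that square feet and dollars are on comparable scales. It omits the other variables listed above, including the categorical Neighborhood, and all rows are hypothetical.

```python
from statistics import mean, pstdev

def standardize(points):
    """Z-score each column so square feet and dollars are on comparable scales."""
    stats = [(mean(col), pstdev(col)) for col in zip(*points)]
    return [tuple((v - m) / s for v, (m, s) in zip(p, stats)) for p in points]

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns a cluster label (0..k-1) for each point."""
    # Deterministic init: spread initial centroids across points ordered by the first feature.
    order = sorted(range(len(points)), key=lambda i: points[i])
    centroids = [points[order[j * len(points) // k]] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(mean(col) for col in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical (living area in sq ft, sale price) pairs, NOT real Ames observations.
rows = [
    (1700, 190_000), (900, 95_000), (3000, 420_000), (1100, 110_000), (1600, 180_000),
    (3200, 450_000), (1000, 105_000), (2800, 380_000), (1800, 200_000),
]

print(kmeans(standardize(rows), k=3))
```

Cluster numbers produced by k-means are arbitrary. For that reason, segments such as "entry-level" or "high-end" are best named by inspecting each cluster's typical size and price rather than its number or color.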